Disaster Survival Guide in Petascale Computing: An Algorithmic Approach

Authors

  • Jack J. Dongarra
  • Zizhong Chen
  • George Bosilca
  • Julien Langou
Abstract

1 Disaster Survival Guide in Petascale Computing: An Algorithmic Approach
  1.1 FT-MPI: A fault tolerant MPI implementation
    1.1.1 FT-MPI Overview
    1.1.2 FT-MPI: A Fault Tolerant MPI Implementation
    1.1.3 FT-MPI Usage
  1.2 Application Level Diskless Checkpointing
    1.2.1 Neighbor-Based Checkpointing
    1.2.2 Checksum-Based Checkpointing
    1.2.3 Weighted-Checksum-Based Checkpointing
  1.3 A Fault Survivable Iterative Equation Solver
    1.3.1 Preconditioned Conjugate Gradient Algorithm
    1.3.2 Incorporating Fault Tolerance into PCG
  1.4 Experimental Evaluation
    1.4.1 Performance of PCG with Different MPI Implementations
    1.4.2 Performance Overhead of Taking Checkpoint
    1.4.3 Performance Overhead of Performing Recovery
    1.4.4 Numerical Impact of Round-Off Errors in Recovery
  1.5 Discussion
  1.6 Conclusion and Future Work

Similar resources

DAG-Based Software Frameworks for PDEs

The task-based approach to software and parallelism is well-known and has been proposed as a potential candidate, named the silver model, for exascale software. This approach is not yet widely used in the large-scale multi-core parallel computing of complex systems of partial differential equations. After surveying task-based approaches we investigate how well the Uintah software and an extensi...

Petascale algorithms for reactor hydrodynamics

We describe recent algorithmic developments that have enabled large eddy simulations of reactor flows on up to P = 65,000 processors on the IBM BG/P at the Argonne Leadership Computing Facility.

Petascale Computing for Future Breakthroughs in Global Seismology

Will the advent of “petascale” computers be relevant to research in global seismic tomography? We illustrate here in detail two possible consequences of the expected leap in computing capability. First, being able to identify larger sets of differently regularized/parameterized solutions in shorter times will allow us to evaluate their relative quality by more accurate statistical criteria than in...

Abstractions and Middleware for Petascale Computing and Beyond

As high-performance computing moves to the petascale and beyond, a number of algorithmic and software challenges need to be addressed. This paper reviews the main performance-limiting factors in today’s high-performance computing software and outlines a possible new programming paradigm to address them. The proposed paradigm is based on abstract parallel data structures and operations that enca...

A robust optimization model for distribution and evacuation in the disaster response phase

Natural disasters, such as earthquakes, affect thousands of people and can cause enormous financial loss. Therefore, an efficient response immediately following a natural disaster is vital to minimize the aforementioned negative effects. This research paper presents a network design model for humanitarian logistics which will assist in location and allocation decisions for multiple disaster per...

Publication date: 2001